Attentive Single-Tasking of Multiple Tasks
In this work we address task interference in universal networks by training a
network on multiple tasks while having it perform only one task at a time, an
approach we refer to as "single-tasking multiple tasks". The
network thus modifies its behaviour through task-dependent feature adaptation,
or task attention. This gives the network the ability to accentuate the
features that are adapted to a task, while shunning irrelevant ones. We further
reduce task interference by forcing the task gradients to be statistically
indistinguishable through adversarial training, ensuring that the common
backbone architecture serving all tasks is not dominated by any of the
task-specific gradients. Results in three multi-task dense labelling problems
consistently show: (i) a large reduction in the number of parameters while
preserving, or even improving performance and (ii) a smooth trade-off between
computation and multi-task accuracy. We provide our system's code and
pre-trained models at http://vision.ee.ethz.ch/~kmaninis/astmt/.
Comment: CVPR 2019 camera ready.
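The task-dependent feature adaptation described above can be pictured as a per-task attention block that modulates shared backbone features. Below is a minimal PyTorch sketch of that idea, assuming SE-style channel gating; the class name and structure are illustrative, not the authors' implementation, and the adversarial gradient alignment is omitted.

```python
import torch
import torch.nn as nn

class TaskAttention(nn.Module):
    """Hypothetical task-conditioned channel attention (SE-style).

    Each task owns its own squeeze-and-excitation block, so the same
    shared backbone features get re-weighted differently per task.
    """
    def __init__(self, channels: int, num_tasks: int, reduction: int = 4):
        super().__init__()
        self.gates = nn.ModuleList(
            nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(channels, channels // reduction, 1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels // reduction, channels, 1),
                nn.Sigmoid(),
            )
            for _ in range(num_tasks)
        )

    def forward(self, x: torch.Tensor, task_id: int) -> torch.Tensor:
        # Only the gate of the active task modulates the shared features,
        # accentuating channels relevant to that task.
        return x * self.gates[task_id](x)

# One forward pass per task over the same shared features.
features = torch.randn(2, 64, 32, 32)   # stand-in backbone output
attn = TaskAttention(channels=64, num_tasks=3)
out_task0 = attn(features, task_id=0)   # features adapted for task 0
```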
Deep Extreme Cut: From Extreme Points to Object Segmentation
This paper explores the use of extreme points in an object (left-most,
right-most, top, bottom pixels) as input to obtain precise object segmentation
for images and videos. We do so by adding an extra channel to the input image
of a convolutional neural network (CNN), which contains a Gaussian centered on
each of the extreme points. The CNN learns to transform this
information into a segmentation of an object that matches those extreme points.
We demonstrate the usefulness of this approach for guided segmentation
(grabcut-style), interactive segmentation, video object segmentation, and dense
segmentation annotation. We show that we obtain the most precise results to
date, while requiring less user input, across an extensive and varied selection of
benchmarks and datasets. All our models and code are publicly available on
http://www.vision.ee.ethz.ch/~cvlsegmentation/dextr/.
Comment: CVPR 2018 camera ready.
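As a concrete illustration of the extra input channel described above, the following sketch renders a 2D Gaussian around each extreme point and concatenates the result to the RGB image. Function names and the choice of sigma are assumptions for illustration, not the paper's exact preprocessing.

```python
import numpy as np

def extreme_points_heatmap(points, height, width, sigma=10.0):
    """Render a heatmap with a 2D Gaussian centered on each extreme point.

    `points` is an iterable of (x, y) pixel coordinates, e.g. the
    left-most, right-most, top and bottom points of the object.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    heatmap = np.zeros((height, width), dtype=np.float32)
    for x, y in points:
        g = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
        heatmap = np.maximum(heatmap, g)  # keep the strongest response
    return heatmap

# Concatenate the heatmap to the RGB image as a fourth input channel.
image = np.zeros((256, 256, 3), dtype=np.float32)  # stand-in RGB image
points = [(30, 120), (220, 130), (128, 20), (125, 240)]  # (x, y) extremes
extra = extreme_points_heatmap(points, 256, 256)
network_input = np.concatenate([image, extra[..., None]], axis=-1)  # HxWx4
```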
CAD-Estate: Large-scale CAD Model Annotation in RGB Videos
We propose a method for annotating videos of complex multi-object scenes with
a globally consistent 3D representation of the objects. We annotate each object
with a CAD model from a database, and place it in the 3D coordinate frame of
the scene with a 9-DoF pose transformation. Our method is semi-automatic and
works on commonly available RGB videos, without requiring a depth sensor. Many
steps are performed automatically, and the tasks performed by humans are
simple, well-specified, and require only limited reasoning in 3D. This makes
them feasible for crowd-sourcing and has allowed us to construct a large-scale
dataset by annotating real-estate videos from YouTube. Our dataset CAD-Estate
offers 101k instances of 12k unique CAD models placed in the 3D representations
of 20k videos. In comparison to Scan2CAD, the largest existing dataset with CAD
model annotations on real scenes, CAD-Estate has 7x more instances and 4x more
unique CAD models. We showcase the benefits of pre-training a Mask2CAD model on
CAD-Estate for the task of automatic 3D object reconstruction and pose
estimation, demonstrating that it leads to performance improvements on the
popular Scan2CAD benchmark. The dataset is available at
https://github.com/google-research/cad-estate.
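A 9-DoF pose of the kind used above combines a per-axis scale (3 DoF), a rotation (3 DoF), and a translation (3 DoF). The sketch below applies such a transform to CAD-model vertices; the function names and the rotation parameterization are illustrative assumptions, not the dataset's specification.

```python
import numpy as np

def apply_9dof(vertices, scale, rotation, translation):
    """Place CAD-model vertices (N, 3) in the scene frame with a
    9-DoF pose: anisotropic scale, 3x3 rotation, and translation."""
    scaled = vertices * scale        # per-axis scale (3 DoF)
    rotated = scaled @ rotation.T    # rotate into the scene frame (3 DoF)
    return rotated + translation     # translate to the object's place (3 DoF)

def rot_z(theta):
    """Rotation about the z-axis; one simple way to build the 3x3 matrix."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

verts = np.random.rand(100, 3)                        # stand-in CAD mesh
posed = apply_9dof(verts,
                   scale=np.array([1.0, 2.0, 0.5]),
                   rotation=rot_z(np.pi / 4),
                   translation=np.array([4.0, 0.0, 1.5]))
```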
EgoCOL: Egocentric Camera pose estimation for Open-world 3D object Localization @Ego4D challenge 2023
We present EgoCOL, an egocentric camera pose estimation method for open-world
3D object localization. Our method leverages sparse camera pose reconstructions
in a two-fold manner, video and scan independently, to estimate the camera pose
of egocentric frames in 3D renders with high recall and precision. We
extensively evaluate our method on the Visual Query (VQ) 3D object localization
Ego4D benchmark. EgoCOL can estimate 62% and 59% more camera poses than the
Ego4D baseline in the Ego4D Visual Queries 3D Localization challenge at CVPR
2023 on the val and test sets, respectively. Our code is publicly available at
https://github.com/BCV-Uniandes/EgoCOL
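Registering a sparse camera-pose reconstruction into a scan's coordinate frame typically reduces to estimating a similarity transform between corresponding 3D points. The sketch below shows a generic Umeyama-style least-squares alignment of this kind; it illustrates the registration step, not EgoCOL's exact pipeline.

```python
import numpy as np

def similarity_transform(src, dst):
    """Least-squares similarity alignment of two corresponding point sets,
    e.g. SfM camera centers (src) onto scan coordinates (dst).
    Returns scale s, rotation R, translation t with dst ~ s * R @ src + t."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    cs, cd = src - mu_s, dst - mu_d
    U, S, Vt = np.linalg.svd(cd.T @ cs / len(src))   # cross-covariance
    d = np.sign(np.linalg.det(U @ Vt))               # avoid reflections
    D = np.diag([1.0, 1.0, d])
    R = U @ D @ Vt
    s = np.trace(np.diag(S) @ D) / cs.var(axis=0).sum()
    t = mu_d - s * R @ mu_s
    return s, R, t

# Synthetic check: recover a known scale and translation (R = I).
src = np.random.rand(10, 3)
dst = 2.0 * src + np.array([1.0, -0.5, 0.3])
s, R, t = similarity_transform(src, dst)
aligned = (s * (R @ src.T)).T + t    # should match dst closely
```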
Detection-aided liver lesion segmentation using deep learning
A fully automatic technique for segmenting the liver and localizing its
unhealthy tissues is a convenient tool for diagnosing hepatic diseases and
assessing the response to the corresponding treatments. In this work we propose
a method to segment the liver and its lesions from Computed Tomography (CT)
scans using Convolutional Neural Networks (CNNs), which have shown strong
results in a variety of computer vision tasks, including medical imaging. The
network that segments the lesions uses a cascaded architecture, which first
focuses on the liver region and then segments the lesions within it. Moreover, we
train a detector to localize the lesions, and mask the results of the
segmentation network with the positive detections. The segmentation
architecture is based on DRIU, a Fully Convolutional Network (FCN) with side
outputs that operate on feature maps of different resolutions, thus benefiting
from the multi-scale information learned at different stages of the network.
The main contribution of this work is the use of a detector to localize the
lesions, which we show to be beneficial for removing false positives triggered by
the segmentation network. Source code and models are available at
https://imatge-upc.github.io/liverseg-2017-nipsws/.
Comment: NIPS 2017 Workshop on Machine Learning for Health (ML4H).
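The masking step described above, where positive detections filter the segmentation output, can be sketched as zeroing the probability map outside the detected boxes. The function name, box format, and threshold below are illustrative assumptions, not the paper's exact post-processing.

```python
import numpy as np

def mask_with_detections(seg_prob, boxes, threshold=0.5):
    """Keep segmentation output only inside positively detected boxes.

    `seg_prob` is an HxW lesion-probability map and `boxes` a list of
    (x1, y1, x2, y2) detections classified as positive.
    """
    keep = np.zeros_like(seg_prob, dtype=bool)
    for x1, y1, x2, y2 in boxes:
        keep[y1:y2, x1:x2] = True            # region covered by a detection
    masked = np.where(keep, seg_prob, 0.0)   # suppress everything else
    return masked > threshold                # final binary lesion mask

seg = np.random.rand(128, 128)               # stand-in network output
lesions = mask_with_detections(seg, boxes=[(20, 30, 60, 80)])
```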